Batch Mode Active Learning with Hierarchical-Structured Embedded Variance
نویسندگان
چکیده
We consider the problem of active learning when the categories are represented as a tree with leaf nodes as outputs and internal nodes as clusters of the outputs at multiple granularity. Recent work has improved the traditional techniques by moving beyond ”flat” structure through incorporation of the label hierarchy into the uncertainty measure. However, these methods have two major limitations when used. First, these methods roughly use the information in the label structure but do not take into account the training samples, which may lead to a sampling bias due to their crude approximation of the class relations. Second, none of these methods can work in a batch mode to reduce the computational time of training. We propose a batch mode active learning scheme that exploits both the hierarchical structure of the labels and the characteristics of the training data to select the most informative data for human labeling. We achieve this goal by first using an approach based on graph embedding that embeds the relationships between the labels and data points in a transformed low-dimensional space. Then, we compute uncertainty by calculating the variance among the points and the labels in the embedding space. Finally, the selection criterion is designed to construct batches and incorporate a diversity measure. Experimental results indicate that our technique achieves a notable improvement in performance over the state-of-the-art approaches.
منابع مشابه
An Optimization Based Framework for Dynamic Batch Mode Active Learning
Active learning techniques have gained popularity in reducing human effort to annotate data instances for inducing a classifier. When faced with large quantities of unlabeled data, such algorithms automatically select the salient and representative samples for manual annotation. Batch mode active learning schemes have been recently proposed to select a batch of data instances simultaneously, ra...
متن کاملNear-optimal Batch Mode Active Learning and Adaptive Submodular Optimization
Active learning can lead to a dramatic reduction in labeling effort. However, in many practical implementations (such as crowdsourcing, surveys, high-throughput experimental design), it is preferable to query labels for batches of examples to be labelled in parallel. While several heuristics have been proposed for batch-mode active learning, little is known about their theoretical performance. ...
متن کاملDiscriminative Batch Mode Active Learning
Active learning sequentially selects unlabeled instances to label with the goal of reducing the effort needed to learn a good classifier. Most previous studies in active learning have focused on selecting one unlabeled instance to label at one time while retraining in each iteration. Recently a few batch mode active learning approaches have been proposed that select a set of most informative un...
متن کاملA Batch Mode Active Learning for Networked Data
We study a novel problem of batch mode active learning for networked data. In this problem, data instances are connected with links and their labels are correlated with each other, and the goal of batch mode active learning is to exploit the link-based dependencies and node-specific content information to actively select a batch of instances to query the user for learning an accurate model to l...
متن کاملDynamic Batch Mode Active Learning via L1 Regularization
We propose a method for dynamic batch mode active learning where the batch size and selection criteria are integrated into a single formulation.
متن کامل